ABSTRACT

We describe the construction of MegaZ-LRG, a photometric redshift catalogue of 
over one million luminous red galaxies (LRGs) in the redshift range 0.4 < z < 0.7 
with limiting magnitude i < 20. The catalogue is selected from the imaging data 
of the Sloan Digital Sky Survey Data Release 4. The 2dF-SDSS LRG and Quasar 
(2SLAQ) spectroscopic redshift catalogue of 13,000 intermediate-redshift LRGs provides
a photometric redshift training set, allowing use of ANNz, a neural network-based
photometric redshift estimator. The rms photometric redshift accuracy obtained for an
evaluation set selected from the 2SLAQ sample is $\sigma_z = 0.049$ averaged over all galaxies,
and $\sigma_z = 0.040$ for a brighter subsample (i < 19.0). The catalogue is expected
to contain ~5 per cent stellar contamination. The ANNz code is used to compute
a refined star/galaxy probability based on a range of photometric parameters; this
allows the contamination fraction to be reduced to 2 per cent with negligible loss of
genuine galaxies. The MegaZ-LRG catalogue is publicly available on the World Wide
Web from http://www.2slaq.info.

Key words: surveys - catalogues - galaxies: distances and redshifts - cosmology: 
observations 



1 INTRODUCTION 

Galaxy redshift surveys have been a cornerstone amongst 
probes of the Universe since Hubble's discovery of the cos- 
mological expansion in 1929. Recent years have witnessed 
the construction of exquisitely detailed maps of the local 
(z ~ 0.1) Universe by the 2-degree Field Galaxy Redshift
Survey (2dFGRS; Colless et al. 2001) and the Sloan Digital
Sky Survey (SDSS; York et al. 2000). These surveys have defined
the new state-of-the-art in spectroscopic detector technology,
each constructing spectrographs capable of simultaneous
observation of hundreds of objects. However, further
significant increases in the depth and area accessible to spec- 
troscopic redshift surveys will rely on the development of a 
new generation of instruments. 

Photometric redshifts, which are estimated from broad- 
band galaxy colours rather than spectra, offer an invalu- 
able interim solution. Relative to multi-object spectroscopy, 
high-quality photometry can be obtained far more quickly 
and for significantly fainter sources. Photometric redshift 
estimators are numerous, but generally involve calibration 
against either an observed spectroscopic training set (e.g.
polynomial fitting, Connolly et al. 1995, or neural networks,
Firth, Lahav & Somerville 2003), or a small set of template
spectra (e.g. HYPERz; Bolzonella, Miralles & Pelló 2000;
see also Benítez 2000; Csabai et al. 2003). The accuracy of
photometric redshifts will never approach the precision possible
with spectroscopic redshifts, but the efficiency of the
method allows vastly wider and deeper surveys to be conceived.

Figure 1. Spectroscopic redshift distribution of the 2SLAQ LRG
sample (13,115 galaxies), in redshift bins of width $\Delta z = 0.01$.

This paper describes the construction of the MegaZ- 
LRG photometric redshift catalogue. MegaZ-LRG com- 
prises more than one million intermediate-redshift (0.4 < 
z < 0.7) luminous red galaxies (LRGs) selected from 
the imaging data of the SDSS Data Release 4 (DR4;
Adelman-McCarthy et al. 2006). Lower redshift LRGs (z <
0.45) are already targeted with the SDSS spectrograph
across the SDSS survey area (Eisenstein et al. 2001). By
making use of the photometric redshift technique, MegaZ- 
LRG provides redshift information (albeit less accurate than 
that provided by spectroscopic study) for significantly more 
distant and for a far greater number of LRGs. LRGs are particularly
suited to the photometric redshift technique due to
the homogeneity of the population (Eisenstein et al. 2003)
and, especially, the prominence of the 4000 Å break in their
spectra.

The construction of the MegaZ-LRG catalogue has been 
facilitated by the recent completion of the 2dF-SDSS LRG
and Quasar survey (2SLAQ; Cannon et al. 2006). 2SLAQ
combined the high-precision SDSS imaging with the exceptional
spectroscopic capabilities of the Two-degree Field
(2dF) instrument on the 3.9-m Anglo-Australian Telescope
to produce a spectroscopic redshift catalogue of ~13,000 luminous
red galaxies in the redshift range 0.4 < z < 0.7
(Figure 1).

By necessity, the 2SLAQ survey was restricted to a limited
number of fields located in the equatorial stripe of the
SDSS survey area (Figure 2). Applying the 2SLAQ LRG selection
to the entire SDSS DR4 imaging area returns a sample
of over one million galaxies. The spectroscopic 2SLAQ
catalogue constitutes a superb photometric redshift training
set for this sample. The value of such a training set
is twofold: most importantly, it enables a detailed analysis
of the photometric redshift error distribution, but it can
also be used to calibrate the photometric redshift estimator.
We make use of ANNz (Collister & Lahav 2004), a neural
network-based photometric redshift estimator, for which the
existence of a representative training set is essential.

The structure of this paper is as follows. In the next
section we describe the criteria used to select the MegaZ-LRG
target sample from the SDSS DR4 imaging catalogue.
In Section 3 we explain the ANNz photometric redshift technique
and evaluate the accuracy of the photometric redshifts
obtained for the MegaZ-LRG catalogue. In Section 4 ANNz
is used to refine the star/galaxy separation in the catalogue.
Finally, Section 5 describes the MegaZ-LRG catalogue itself.



2 TARGET SELECTION 

MegaZ-LRG is based on SDSS five-band (ugriz;
Fukugita et al. 1996; Smith et al. 2002) imaging data
obtained with a large-format CCD camera (Gunn et al.
1998) mounted on a special-purpose 2.5-m telescope
(Gunn et al. 2006) located at Apache Point Observatory
in New Mexico. The photometric accuracy is on the order
of a few per cent, and the astrometric accuracy of the
object positions is approximately 0.1" (Pier et al. 2003).
Technical details can be found in York et al. (2000) and
Stoughton et al. (2002).



2.1 Selection criteria 

The neural-network technique for photometric redshift esti- 
mation relies on the training set being well-representative of 
the target sample. Our catalogue selection is therefore based 
directly on that of the 2SLAQ LRG sample. The 2SLAQ 
selection criteria for identifying LRGs at 0.4 < z < 0.7
are described in the following subsections (see Cannon et al.
2006 for a more detailed explanation of the selection). All
magnitudes are corrected for Galactic extinction following
Schlegel, Finkbeiner & Davis (1998). The various magnitude
types are described in Table 1.

Magnitude limits are motivated primarily by the need 
for sufficient flux for the 2dF spectrograph. The de Vau- 
couleurs magnitude provides the best measure of the total 
flux for faint LRGs. 



$i_{\rm fibre} < 21.4$;    (1)

$17.5 < i_{\rm deV} < 20.0$.    (2)



The 2SLAQ catalogue has high (~90 per cent) completeness
to $i_{\rm deV} < 19.8$, but drops off sharply beyond
this limit (Figure 3). In fact the nominal 2SLAQ flux limit
is $i_{\rm deV} = 19.8$; the small number of objects fainter than
this were obtained in an early observing run in which
the flux limit was temporarily moved to $i_{\rm deV} = 20.0$ (see
Cannon et al. 2006 for details).

Colour cuts are used to isolate the LRGs. All colours are 
calculated using model magnitudes; these provide unbiased 
colours since they are based on an identical aperture in every 
band. The colour selection is illustrated by Figure 4.



$0.5 < g - r < 3$;    (3)

$r - i < 2$;    (4)

$c_{\rm par} \equiv 0.7(g - r) + 1.2(r - i - 0.18) > 1.6$;    (5)
Figure 2. Map of MegaZ-LRG sample (small points covering the entire SDSS DR4 area), and 2SLAQ fields (black regions centred
on equator). For clarity, only 50,000 randomly-selected MegaZ-LRG galaxies are shown. Equal-area Aitoff projection of equatorial 
coordinates. 



Table 1. Definitions of the different types of SDSS magnitude used in the MegaZ-LRG selection.

Type of magnitude          Definition

psf magnitude              Magnitude corresponding to the best fit of the point-spread function
                           at the galaxy position. Useful for star-galaxy separation.
de Vaucouleurs magnitude   Magnitude corresponding to the best fit of a de Vaucouleurs profile.
                           The best estimate of the total flux for faint LRGs.
exponential magnitude      Magnitude corresponding to the best fit of an exponential profile.
model magnitude            Uses the best fit of a de Vaucouleurs or exponential profile in the r-band,
                           with the amplitude scaled to fit measurements in other filters.
                           This is the best estimator of the colour of the galaxy, because the same
                           aperture is used for all the filters.
fibre magnitude            The flux contained within the aperture of a spectroscopic fibre (3" in diameter).



$d_{\rm perp} \equiv (r - i) - (g - r)/8.0 > 0.5$.    (6)

The selection on $c_{\rm par}$ separates later-type galaxies from
the LRG sample. Cuts above lines of constant $d_{\rm perp}$ select
early-type galaxies with increasingly high redshift. The
main 2SLAQ sample is defined by $d_{\rm perp} > 0.55$, but a small
number of (lower-redshift) LRGs were observed below this
boundary during the initial observing runs.

Effective star-galaxy separation is performed using the 
following criteria (see also Section 4):

$i_{\rm psf} - i_{\rm model} > 0.2\,(21.0 - i_{\rm deV})$;    (7)
$i$-band de Vaucouleurs radius $> 0.2''$.    (8)

Note that the SDSS star-galaxy classification is not used. 

Some final technical requirements: "Detected" in both r
and i; nchild = 0; not SATURATED in any band; not NO_PETRO
in r or i.

These selection criteria are extremely effective at identifying
LRGs in the redshift range of interest: 95 per cent of
the objects targeted by 2SLAQ are bona fide intermediate-redshift
LRGs. The most significant contaminants, accounting
for virtually all of the remaining 5 per cent, are M-type
stars. These cannot be trivially separated from the LRGs using
gri colours, and the small angular diameters subtended
by galaxies at these distances mean that it is difficult to distinguish
between LRGs and point sources morphologically.
In Section 4 we make use of additional photometric
parameters to derive an enhanced, neural network-based
star/galaxy separation flag.
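
The cuts in equations (1) to (8) can be collected into a single boolean mask. The following is a minimal sketch rather than the selection code actually used for the catalogue: the function and argument names are hypothetical, all magnitudes are assumed to be already corrected for Galactic extinction, and the flag requirements (detection in r and i, nchild = 0, SATURATED, NO_PETRO, PRIMARY) are assumed to have been applied beforehand.

```python
import numpy as np

def select_lrg(g_model, r_model, i_model, i_fibre, i_dev, i_psf, dev_radius_i):
    """Return a boolean mask of objects passing cuts (1)-(8).

    All argument names are illustrative. Magnitudes are assumed to be
    extinction-corrected; dev_radius_i is in arcseconds.
    """
    g_model, r_model, i_model, i_fibre, i_dev, i_psf, dev_radius_i = (
        np.asarray(a, dtype=float)
        for a in (g_model, r_model, i_model, i_fibre, i_dev, i_psf, dev_radius_i)
    )
    g_r = g_model - r_model          # model colours
    r_i = r_model - i_model
    c_par = 0.7 * g_r + 1.2 * (r_i - 0.18)
    d_perp = r_i - g_r / 8.0
    return (
        (i_fibre < 21.4)                                  # (1)
        & (i_dev > 17.5) & (i_dev < 20.0)                 # (2)
        & (g_r > 0.5) & (g_r < 3.0)                       # (3)
        & (r_i < 2.0)                                     # (4)
        & (c_par > 1.6)                                   # (5)
        & (d_perp > 0.5)                                  # (6)
        & (i_psf - i_model > 0.2 * (21.0 - i_dev))        # (7)
        & (dev_radius_i > 0.2)                            # (8)
    )
```

Restricting the result further to d_perp > 0.55 and i_deV < 19.8 would reproduce the main 2SLAQ selection described above.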



2.2 The photometric target sample 

The MegaZ-LRG photometric sample is selected from the 
SDSS DR4 imaging catalogue using the criteria exactly as 
specified above. Only PRIMARY objects are included in order 
to omit duplicate observations. Submitting the selection cri- 
teria to the SDSS DR4 catalogue returns 1,214,117 objects. 
The de Vaucouleurs magnitude and model colour distribu- 
tions of the sample are compared with those of the 2SLAQ 
LRG catalogue in Figures 3 and 4.



2.3 The training set 

The final 2SLAQ LRG catalogue provides reliable spectro- 
scopic redshifts for 13,768 unique objects. In order to en- 
sure consistency between the training and target samples, 
we obtained the DR4 photometry for the 2SLAQ objects by 
coordinate-matching against our photometric target sample. 
Confident matches were found for 13,139 objects; changes 
introduced into the SDSS photometric pipeline since the 
2SLAQ targets were selected mean that the remainder now 
fail the selection. 

Figure 3. Distribution of i-band de Vaucouleurs magnitude for
the MegaZ-LRG target sample (dotted line) and the 2SLAQ
training set (histogram). The 2SLAQ histogram is scaled so that
the two distributions have equal integrated area in the region
$i_{\rm deV} < 19.8$. The bin size is $\Delta i_{\rm deV} = 0.01$.

Figure 4. Colour distributions of 2SLAQ LRGs (squares) and
MegaZ-LRG targets (small points). For clarity, only a randomly-
selected subset of each catalogue is shown.

3 PHOTOMETRIC REDSHIFTS 
3.1 ANNz 

The ANNz photometric redshift code^1 (Collister & Lahav
2004; Collister 2006) is based on neural networks. In common
with other "empirical" photometric redshift estimators,
it relies on the existence of a training set of objects
with spectroscopic redshifts. This sample should be representative
of the target photometric sample in terms of
magnitude and colour-space distributions. Given a well-matched
training set, the neural-network method is highly
competitive with commonly-used photometric redshift estimators:
it is less prone to systematic errors than the
SED-fitting approach, and is found to provide the greatest
accuracy amongst similar training-set-based methods
(Firth et al. 2003; Csabai et al. 2003). The primary drawback
of the method is the need for observational training
data, which can be expensive to obtain. One can only apply
the trained network to target objects which lie within the
parameter space sampled by the training data. The technique
is, therefore, not well-suited to the traditional use of
photometric redshifts at faint magnitudes, where obtaining
a sufficiently large training set is likely to be impossible.
Rather, its strength is in producing very large redshift samples
from the combination of a modest spectroscopic survey
and a much wider photometric sample.

1 The ANNz software package may be obtained from
http://www.star.ucl.ac.uk/~lahav/annz.html

The artificial neural network (ANN) is in essence a 
highly-flexible, fully non-linear fitting function. The inputs 
to the function are the photometric parameters (usually the 
galaxy magnitudes in each of a range of filters), and the 
output is the redshift. The ANN function incorporates a 
number of free parameters known as weights; these are op- 
timised (the network is "trained") using the training set. 
The training process involves minimising a "cost function" : 
essentially the sum over the training set of the squared differ- 
ences between the photometric and spectroscopic redshifts. 
The number of free parameters is controlled by the network 
architecture. Firth et al. (2003) investigate the influence of
network architecture on performance. For the same number 
of parameters, adding extra hidden layers is found to give 
greater gains than widening existing layers. As the network 
complexity is increased, the accuracy eventually converges 
so that no further improvement is gained by adding addi- 
tional nodes. The network architectures used in the following 
applications are chosen (by trial-and-error) to be sufficiently 
complex for such convergence to be achieved. Note that the 
cost function includes a "weight decay" term that prevents 
weights becoming large unless they contribute a significant 
improvement to the performance of the network. 
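
As an illustration of the quantities described in this paragraph, the sketch below evaluates a sum-of-squares cost with a weight-decay penalty for a small feed-forward network. It is not the ANNz implementation: the tanh activation, the quadratic form of the penalty, the `decay` coefficient and all function names are assumptions made for the example.

```python
import numpy as np

def init_weights(layer_sizes, rng):
    """One weight matrix per layer; the extra row holds the bias weights."""
    return [
        rng.normal(0.0, 0.1, size=(n_in + 1, n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def predict(weights, x):
    """Forward pass: tanh hidden layers (assumed), linear output node."""
    a = np.asarray(x, dtype=float)
    for k, w in enumerate(weights):
        a = np.hstack([a, np.ones((a.shape[0], 1))]) @ w
        if k < len(weights) - 1:
            a = np.tanh(a)
    return a[:, 0]

def cost(weights, x, z_spec, decay=1e-3):
    """Sum of squared redshift residuals plus an assumed quadratic weight decay."""
    resid = predict(weights, x) - np.asarray(z_spec, dtype=float)
    penalty = sum(np.sum(w ** 2) for w in weights)
    return np.sum(resid ** 2) + decay * penalty
```

With this bookkeeping, `init_weights([4, 10, 10, 1], rng)` holds (4+1)x10 + (10+1)x10 + (10+1)x1 = 171 free parameters; evaluating `cost` on a held-out validation sample at each iteration is the quantity one would monitor to decide when to stop training.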

In practice, the available training data is subdivided 
into "training" and "validation" samples. Only the training 
set is directly used to train the network, but at each itera- 
tion of the minimiser the cost function is also evaluated on 
the validation set. This prevents over-fitting to the training 
data, by halting the training process once convergence is ob- 
served for the validation set. Once training is complete, the 
target galaxies are submitted to the network in turn, and 
the output from the network in each case is assigned as the 
photometric redshift. The accuracy is improved by indepen- 
dently training a number of networks (on the same training 
data) and using the mean of their outputs as the photometric 
redshift for each target galaxy. However, for a well-matched 
training set (as we have here) the gain in accuracy when using
such a "committee" is usually minor. Full details of the
ANNz software may be found in Collister & Lahav (2004).

3.2 Photometric redshift evaluation for the 
2SLAQ sample 

Although the photometric target sample is expected to con- 
tain stellar contamination, we remove stars from the 2SLAQ 
sample (using their spectroscopic identification) before any 
photometric redshift estimation. As a result, ANNz does not 
recognise stars at all and assigns them extragalactic red- 
shifts, but our tests show that the photometric redshift accu- 
racy for the genuine galaxies is optimised by this approach. 
The stellar contamination is discussed in more detail in Section 4.
Removing the stars reduces the 2SLAQ sample to
12,515 objects.

In order to allow the photometric redshift accuracy to 
be assessed objectively, we separate out 8,515 2SLAQ objects
at random to be used solely as an evaluation set. During
the evaluation phase only the remaining 4,000 objects
are used for training ANNz, and the evaluation set is treated
strictly as a mock target sample. Figures 3 and 4 show that
the evaluation set may be considered to be accurately representative
of our photometric target sample (although caution
is exercised at $i_{\rm deV} > 19.8$ and $d_{\rm perp} < 0.55$ where there
are relatively few training objects; see Section 3.2.1). We find
that a 4,000-member training set is large enough to ensure
convergence in terms of the photometric redshift accuracy:
virtually identical results are obtained using a training set
of 8,000 members, but reducing the training set to 2,000 members
results in a ~5 per cent increase in the overall photometric
redshift error.

For photometric redshift estimation we use the model 
magnitudes in griz as the inputs to ANNz. The model mag- 
nitudes are preferred due to their use of equal apertures in 
each band; this allows unbiased colour estimates, crucial to 
accurate photometric redshifts. We do not make use of the u 
band primarily due to concern over a time-varying red leak
in this filter (SDSS DR4 web site) that could introduce sys- 
tematic coordinate dependence into the photometric redshift 
errors. Irrespective of this concern, the low signal to noise 
in the u band relative to the other SDSS filters for LRGs 
means that it is found to contribute no measurable benefit 
to the photometric redshift accuracy. 

An ANNz committee of four networks with a 4:10:10:1
architecture (see Collister & Lahav 2004 for an explanation
of this notation) was trained using the 4,000 object train- 
ing set (split equally into training and validation subsets). 
Each member of the 8,515 object evaluation set was then 
submitted to the trained committee in order to obtain the 
photometric redshifts. 

The photometric redshifts for the evaluation set are 
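plotted against the spectroscopic redshifts in Figure 5.
The photometric redshift accuracy is characterised by the
bias, $\langle \delta z \rangle$, where

$\delta z \equiv z_{\rm phot} - z_{\rm spec}$,    (9)

and the dispersion

$\sigma_z^2 = \langle (\delta z)^2 \rangle - \langle \delta z \rangle^2$,    (10)

or with the inevitable loss of accuracy due to the stretching
of the spectrum at increased redshift factored out,

$\sigma_0^2 = \left\langle \left( \frac{\delta z}{1 + z_{\rm spec}} \right)^2 \right\rangle - \left\langle \frac{\delta z}{1 + z_{\rm spec}} \right\rangle^2$.    (11)
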
Note that these are purely statistical measures which can 
only be calculated for the evaluation set. 
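
These statistics are straightforward to compute for any sample with both redshifts available. The sketch below is illustrative only; the function name is hypothetical, and the scaled dispersion is assumed to divide each residual by (1 + z_spec).

```python
import numpy as np

def photoz_stats(z_phot, z_spec):
    """Return (bias, sigma_z, sigma_0) for matched photometric/spectroscopic redshifts."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = z_phot - z_spec                 # residual delta z
    bias = dz.mean()                     # <delta z>
    sigma_z = dz.std()                   # sqrt(<dz^2> - <dz>^2), population form
    scaled = dz / (1.0 + z_spec)         # assumed (1 + z_spec) scaling
    sigma_0 = scaled.std()
    return bias, sigma_z, sigma_0
```

Note that `numpy.ndarray.std` uses the population (ddof = 0) form by default, which matches the definition of the dispersion as the mean square minus the squared mean.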

For the evaluation set, the average photometric redshift
error is $\sigma_z = 0.0488$ or $\sigma_0 = 0.0320$, similar to that obtained
for the same sample by Padmanabhan et al. (2005). Figure 6
shows how the photometric redshift error varies with
$z_{\rm phot}$: the ANNz photometric redshift is seen to be an unbiased
estimator for $z_{\rm spec}$ over the range $0.45 < z_{\rm phot} < 0.65$,
with $|\langle \delta z \rangle| < 10^{-3}$, and the dispersion increases with redshift
by no more than the expected factor of (1 + z). Very
few objects are assigned $z_{\rm phot} < 0.45$, and the average photometric
redshift in this region is biased to higher values.
At $z_{\rm phot} > 0.65$ the dispersion noticeably worsens and the
average photometric redshift is positively biased.

Figure 5. Photometric redshift accuracy for the LRG evaluation
set described in Section 3.2. The dashed lines show $z_{\rm spec} = z_{\rm phot}$
and the interval $\pm\sigma_0(1 + z_{\rm phot})$. The solid lines show the mean
and standard deviation of $z_{\rm spec}$ as a function of $z_{\rm phot}$, evaluated
in bins of $\Delta z_{\rm phot} = 0.02$.

It is possible, using ANNz, to estimate the contribution
to the error budget originating from the photometric
noise (as described in Collister & Lahav 2004). Unsurprisingly,
this contribution increases with redshift: at $z_{\rm spec} < 0.5$
the photometric noise is responsible for an average photometric
redshift uncertainty of $\sigma_z = 0.021$, but this increases
to $\sigma_z = 0.031$ for objects at $z_{\rm spec} > 0.6$. We also note that
the average scatter between the outputs of the individual
ANNs in the committee is $8 \times 10^{-4}$.

Figure 7 shows the photometric redshift residuals as
a function of the spectroscopic redshift, and makes it clear 
that the photometric redshift distribution is skewed with re- 
spect to the spectroscopic redshifts. Objects with lower spec- 
troscopic redshifts tend to be assigned higher photometric 
redshifts on average, and objects with higher spectroscopic 
redshifts are assigned lower photometric redshifts. Note that 
this bias is not relevant when using the photometric redshifts 
to assign galaxies to bins: the mean spectroscopic redshift in 
a photometric redshift-selected bin will be close to the cen- 
tral photometric redshift in that bin, and the distribution 
around the mean is approximately symmetric. 

Regardless of these general observations, it is critical 
in any application of the catalogue that a careful, individ- 
ual assessment of the impact of the photometric redshift 
errors is made. As an example, we show the spectroscopic
redshift distributions in photometric redshift-selected bins
of width $\Delta z_{\rm phot} = 0.05$ (Figure 8). Gaussian fits to these
distributions are overplotted, and the mean and variance
of the fits are given in Table 2. The Gaussian provides a
reasonable fit to the distribution in each of the bins. However,
particularly in the $0.45 < z_{\rm phot} < 0.50$ bin, the actual
distribution has a tighter core and stronger wings than the
fitted Gaussian. Padmanabhan et al. (2005) obtain similar
distributions from their independent photometric redshift
estimation. Analyses based on these bins should take care
that this non-Gaussianity is taken into account: ideally one
should use the measured $n(z_{\rm spec})$ directly, rather than attempting
to parameterize the distribution. The evaluation
set photometric and spectroscopic redshifts have been made
available with the MegaZ-LRG catalogue for this specific
purpose.

Figure 6. Photometric redshift residuals versus photometric redshift.
The dashed lines show the interval $\pm\sigma_0(1 + z_{\rm phot})$ and the
solid lines show the standard deviation of $z_{\rm phot} - z_{\rm spec}$.

Figure 7. As Figure 6 but showing the photometric redshift
residuals versus spectroscopic redshift.

Figure 8. Spectroscopic redshift distributions in photometric
redshift-selected bins of width $\Delta z_{\rm phot} = 0.05$.

Table 2. Mean and variance of Gaussian distributions fitted to
the spectroscopic redshift distributions in photometric redshift
selected bins (Figure 8).

Bin                              $\mu$     $\sigma$

$0.45 < z_{\rm phot} < 0.50$     0.477     0.036
$0.50 < z_{\rm phot} < 0.55$     0.524     0.041
$0.55 < z_{\rm phot} < 0.60$     0.571     0.043
$0.60 < z_{\rm phot} < 0.65$     0.620     0.054


It is important to note that these spectroscopic redshift 
distributions are not expected to be appropriate to samples 
selected from the complete MegaZ-LRG catalogue; they may 
only be used if cuts are applied at $i_{\rm deV} = 19.8$ and $d_{\rm perp} = 0.55$
to bring the MegaZ-LRG selection into line with the
2SLAQ sample (see Section 2.1).
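
Using the measured spectroscopic redshift distribution of a photometric redshift bin directly amounts to histogramming the evaluation-set spectroscopic redshifts of the objects that fall in that bin. The following is a minimal sketch under stated assumptions: the function name, the default histogram range and the bin width of 0.01 are illustrative choices, and the inputs are assumed to be the evaluation-set redshift arrays after any magnitude and colour cuts have been applied.

```python
import numpy as np

def nz_in_photoz_bin(z_phot, z_spec, lo, hi, dz=0.01, z_range=(0.3, 0.8)):
    """Histogram z_spec for objects with lo <= z_phot < hi.

    Returns (counts, edges, mean, std) of the selected spectroscopic redshifts.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    in_bin = (z_phot >= lo) & (z_phot < hi)
    selected = z_spec[in_bin]
    n_bins = int(round((z_range[1] - z_range[0]) / dz))
    counts, edges = np.histogram(selected, bins=n_bins, range=z_range)
    return counts, edges, selected.mean(), selected.std()
```

For example, `nz_in_photoz_bin(z_phot, z_spec, 0.45, 0.50)` returns the empirical distribution for the lowest bin of Table 2; an empty selection yields NaN for the mean and standard deviation.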

Figure 9 displays the dependence of the photometric
redshift accuracy on the i-band model magnitude. The redshift
error increases steadily until $i_{\rm model} \sim 20$, beyond which
the accuracy degrades much more rapidly as the limiting
magnitude is approached. The photometric redshift accuracy
for the brightest objects is significantly better than the
average: considering only objects with $i_{\rm model} < 19.0$, we find
$\sigma_z = 0.0400$ and $\sigma_0 = 0.0276$.

3.2.1 Accuracy in low-completeness training regions

As explained above, the 2SLAQ catalogue suffers from low 
completeness towards certain limits of the parameter space 
defined by the cuts in Section 2. The regions of interest
can be seen in Figures 3 and 4 to host large numbers of
target objects, but relatively few training objects. We now 
assess whether the scarcity of training data in these regions 
adversely impacts the photometric redshift accuracy. 

We first examine the photometric redshift errors for objects
having $19.8 < i_{\rm deV} < 20.0$. In fact, as Figure 10 shows,
the photometric redshift accuracy in this regime shows no
suggestion of deterioration. For the faintest objects, the photometric
redshift accuracy is a much stronger function of
the model magnitudes since it is these which are passed to
ANNz and used to compute galaxy colours. The significant
stochasticity between the i-band de Vaucouleurs and model
magnitudes (Figure 11) ensures that this dependence is not
propagated to $i_{\rm deV}$, and the photometric redshift accuracy
in $19.8 < i_{\rm deV} < 20.0$ is close to the average for the sample
as a whole. Note that $i_{\rm model}$ is determined using the
best-fitting profile in the r-band, whereas $i_{\rm deV}$ uses the de
Vaucouleurs model fitted in the i-band; these are not equal,
in general, even when the model magnitude is based on a de
Vaucouleurs profile. See York et al. (2000) for details.

A similar concern applies to the $d_{\rm perp} < 0.55$ region.
The photometric redshift error evaluated only on these objects
is $\sigma_z = 0.051$, $\sigma_0 = 0.033$; a little higher than the
average but still an acceptable error level. Note that $d_{\rm perp}$
is an excellent redshift indicator: lower $d_{\rm perp}$ implies lower
redshift. These $0.50 < d_{\rm perp} < 0.55$ objects typically have
$z_{\rm spec} \sim 0.45$.

Figure 9. Dependence of the photometric redshift accuracy on
the i-band model magnitude. Points show the individual redshift
errors for galaxies in the evaluation set. The solid line shows $\sigma_z$
and the dashed line shows $\sigma_0$ (which may be considered a fairer
measure since it corrects for the different redshift distributions of
samples defined by different magnitude ranges).

Figure 10. As Figure 9 but showing the dependence of the photometric
redshift accuracy on the i-band de Vaucouleurs magnitude.
The low sensitivity of $\sigma$ to $i_{\rm deV}$ may be attributed to the
significant stochasticity between $i_{\rm deV}$ and $i_{\rm model}$ (Figure 11). To
allow more direct comparison, the dotted line shows $\sigma_z$ for $i_{\rm model}$.

Figure 11. Comparison of the i-band de Vaucouleurs and model
magnitudes for the MegaZ-LRG sample.

3.3 Catalogue photometric redshifts

The photometric redshifts for the final MegaZ-LRG catalogue
were estimated using a new ANNz committee of four
4:10:10:1 networks, trained on the entire 2SLAQ catalogue
(except for the spectroscopically-identified stars): 12,515 objects,
equally assigned to the training and validation subsets.

Figure 12 shows the photometric redshift distribution of
the MegaZ-LRG catalogue together with both the spectroscopic
and photometric redshift distributions of the 2SLAQ
evaluation sample. The MegaZ-LRG catalogue contains considerably
more objects around $z_{\rm phot} \sim 0.45$ than the 2SLAQ
evaluation sample. This is due to the admittance of objects
having $0.50 < d_{\rm perp} < 0.55$: there are 228,520 such objects
in the catalogue, hence the significant boost to numbers at
lower redshifts. If these objects are removed, the photometric
redshift distributions of the MegaZ-LRG catalogue and
2SLAQ evaluation set are very similar.



4 ENHANCED STAR/GALAXY SEPARATION 

Neural networks have previously been successfully applied
to the star/galaxy separation problem by Bertin & Arnouts
(1996), and the ANNz photometric redshift code can be effectively
and straightforwardly applied to the task. We have
trained ANNz to perform star/galaxy separation using the
2SLAQ catalogue as a training set. Instead of using the spectroscopic
redshift as the target network output, we define
$\delta_{\rm sg}$, such that $\delta_{\rm sg} = 1$ if the training object is a galaxy,
and $\delta_{\rm sg} = 0$ if it has been spectroscopically identified as a
star. The output of an ANN trained to predict $\delta_{\rm sg}$ will be a
continuous quantity; it can be shown that this output may
be interpreted as the classification probability for the particular
target object. The closer $\delta_{\rm sg}$ is to 1, the higher the
probability that the particular target object is a galaxy.

Figure 12. Redshift distributions: (dotted histogram) photometric
redshift distribution of the MegaZ-LRG catalogue; (solid histogram)
photometric redshift distribution of the MegaZ-LRG catalogue
including only objects with $d_{\rm perp} > 0.55$; (dashed line)
spectroscopic redshift distribution of the 2SLAQ evaluation set;
(dot-dashed line) photometric redshift distribution of the 2SLAQ
evaluation set. The evaluation set distributions are normalised to
have the same integrated area as the MegaZ-LRG $d_{\rm perp} > 0.55$
histogram. The histograms have bin width $\Delta z = 0.01$.

The ANN method has two especially attractive advan- 
tages: (i) we can allow the network to consider as many 
parameters as we believe may be relevant to the problem of 
star/galaxy separation, and (ii) we do not need to construct 
ad hoc criteria such as those of Section 2, but can simply
leave it to the network to determine the optimal classifica- 
tion scheme. 

We selected 15 SDSS photometric parameters to be used 
as inputs to the star/galaxy classifying ANN; these are listed 
in Table 3. They include the object's magnitude in each of
the SDSS griz filters (the u band is not used for the reasons
outlined in Section 3), along with a number of parameters
describing the angular size and the distribution of light
within the object. Other parameters were considered, in particular
the Petrosian magnitudes and radii, but these were
found to result in negligible improvement in the separation.
As in Section 3, we separated out an evaluation set of 9,139
objects, now selected from 13,139 2SLAQ objects since the 
stars are included. A committee of four 15:20:20:1 networks 
was trained to predict <5 sg using the remaining 4,000 objects 
as the training set. The trained committee was then applied 
to the evaluation set, in order to obtain the predicted galaxy 
probability for each object. The initial stellar contamination 
fraction in both the training and evaluation samples was 5 
per cent. 

Table 3. Photometric parameters used as inputs to ANNz for
star/galaxy separation. Apart from the dereddened model
magnitudes and the SDSS type classification, all parameters are
measured in the i band, since these red objects exhibit greatest
signal-to-noise in this filter. Note that only the four dereddened
model magnitudes were used during the photometric redshift
estimation.

Parameter    Description
dered_g      Dereddened model magnitude
dered_r      Dereddened model magnitude
dered_i      Dereddened model magnitude
dered_z      Dereddened model magnitude
psfMag_i     PSF flux (dereddened)
fiberMag_i   Flux within a 3" diameter fibre aperture (dereddened)
deVMag_i     De Vaucouleurs magnitude (dereddened)
expMag_i     Exponential fit magnitude (dereddened)
deVRad_i     De Vaucouleurs fit scale radius
deVAB_i      De Vaucouleurs fit axis ratio
expRad_i     Exponential fit scale radius
expAB_i      Exponential fit axis ratio
lnLStar_i    Star ln(likelihood)
lnLExp_i     Exponential disk fit ln(likelihood)
lnLDeV_i     De Vaucouleurs fit ln(likelihood)

In order to perform separation one must decide on a
threshold probability for admittance to the galaxy sample.
Increasing this threshold leads to more aggressive removal
of stars, but may also cause more genuine LRGs to be
discarded. Figure 13 shows the effect of varying the admittance
threshold on the contamination level in the evaluation set.
A conservative choice would be to adopt a threshold
galaxy probability of 0.2: this is expected to reduce the
stellar contamination to 2 per cent, with the loss of
only 0.1 per cent of the genuine galaxies. Alternatively, a
more aggressive threshold of 0.8 reduces the expected stellar
contamination to just 0.5 per cent, while still preserving all
but ~1 per cent of the genuine galaxies.
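
The trade-off between contamination and completeness can be
sketched as follows, for a labelled evaluation set. The function
name separation_curves and the toy values are hypothetical and
purely illustrative; the two quantities follow the definitions in
the caption of Figure 13 (stars admitted as a fraction of all
objects admitted, and genuine galaxies discarded as a fraction of
all genuine galaxies).

```python
import numpy as np

def separation_curves(delta_sg, is_star, thresholds):
    """Return (contamination, galaxy_loss) arrays, one entry per threshold."""
    delta_sg = np.asarray(delta_sg, dtype=float)
    is_star = np.asarray(is_star, dtype=bool)
    n_galaxies = np.count_nonzero(~is_star)
    contamination, galaxy_loss = [], []
    for t in thresholds:
        admitted = delta_sg >= t                       # objects passing the cut
        n_admitted = np.count_nonzero(admitted)
        n_stars_in = np.count_nonzero(admitted & is_star)
        n_gal_out = np.count_nonzero(~admitted & ~is_star)
        contamination.append(n_stars_in / n_admitted if n_admitted else 0.0)
        galaxy_loss.append(n_gal_out / n_galaxies if n_galaxies else 0.0)
    return np.array(contamination), np.array(galaxy_loss)

# Toy labelled sample (invented values, for illustration only).
delta_sg = [0.05, 0.10, 0.30, 0.90, 0.95, 0.99, 0.15, 0.85]
is_star = [True, True, True, False, False, False, False, False]
cont, loss = separation_curves(delta_sg, is_star, thresholds=[0.0, 0.2, 0.8])
```

Evaluating the two arrays over a fine grid of thresholds gives
curves of the kind plotted in Figure 13, from which a preferred
threshold can be read off.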

To obtain δ_sg values for each of the objects in the
MegaZ-LRG catalogue, a new committee of four 15:20:20:1
networks was trained, now using the entire 2SLAQ catalogue
as the training set (13,139 objects split equally into
training and validation subsets). Applying an admittance
threshold of 0.2 to the MegaZ-LRG catalogue reduces its
size to 1,190,682 objects. Note that the public MegaZ-LRG
catalogue has not had any such cut applied, but δ_sg is provided
for each object. Star/galaxy separation may be performed
by discarding objects having δ_sg less than one's preferred
threshold (guided by Figure 13).
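
As a minimal sketch of how a user might apply such a cut, the
example below filters the six objects of the example extract
(Table 5) on their delta_sg values, and computes the fraction of
the full catalogue removed by the 0.2 threshold from the two sizes
quoted in the text. The helper name apply_cut is hypothetical; no
particular catalogue file format is assumed.

```python
# (objID, delta_sg) pairs copied from the example extract in Table 5.
rows = [
    (587722952230174879, 0.987),
    (587722952230175340, 0.987),
    (587722952230175431, 0.998),
    (587722952230175557, 0.266),
    (587722952230175583, 0.999),
    (587722952230175590, 0.999),
]

def apply_cut(rows, threshold):
    """Keep objects whose delta_sg is not less than the threshold."""
    return [row for row in rows if row[1] >= threshold]

kept_02 = apply_cut(rows, 0.2)   # conservative threshold
kept_08 = apply_cut(rows, 0.8)   # more aggressive threshold

# Fraction of the full catalogue removed by the 0.2 cut,
# using the catalogue sizes quoted in the text.
n_total, n_after_cut = 1_214_117, 1_190_682
removed_fraction = (n_total - n_after_cut) / n_total
```

All six example objects survive the 0.2 threshold, while the
object with delta_sg = 0.266 is discarded at 0.8. For the full
catalogue, the quoted sizes correspond to the removal of 23,435
objects, or about 1.9 per cent, at the 0.2 threshold.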



5 SUMMARY: THE MEGAZ-LRG CATALOGUE

We have selected a photometric sample from the SDSS DR4
imaging catalogue using the criteria devised for the 2SLAQ
LRG survey. This MegaZ-LRG catalogue contains 1,214,117
objects in total. Luminous red galaxies are expected to
comprise ~95 per cent of the catalogue membership, with the
remaining ~5 per cent dominated by M-type stars.

The 2SLAQ spectroscopic catalogue of ~13,000 LRGs
was used to train the ANNz photometric redshift code. For
each of the catalogue objects, photometric redshifts were
estimated based on the dereddened griz model magnitudes.
The rms photometric redshift error computed for an
evaluation set selected from the 2SLAQ sample is σ_z = 0.049, or
σ_0 = 0.030.
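
For concreteness, the sketch below computes an rms photometric
redshift error from matched photometric and spectroscopic
redshifts. It assumes that the rms error is the root mean square
of z_phot - z_spec, and that the second statistic is the same
quantity with each residual scaled by 1/(1 + z_spec); the latter
is an assumption, since the definition lies outside this section.
The function name photoz_rms and the sample values are
hypothetical.

```python
import numpy as np

def photoz_rms(z_phot, z_spec):
    """Return (sigma_z, sigma_0) for matched redshift arrays."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = z_phot - z_spec
    sigma_z = np.sqrt(np.mean(dz ** 2))                      # plain rms residual
    sigma_0 = np.sqrt(np.mean((dz / (1.0 + z_spec)) ** 2))   # assumed scaling
    return sigma_z, sigma_0

# Invented redshifts, for illustration only.
z_spec = [0.45, 0.50, 0.55, 0.60]
z_phot = [0.47, 0.46, 0.58, 0.60]
sigma_z, sigma_0 = photoz_rms(z_phot, z_spec)
```

Under this assumed definition the scaled statistic is always the
smaller of the two for positive redshifts, in line with the
ordering of the two values quoted above.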

ANNz was separately trained to perform star/galaxy
separation based on a set of 15 photometric parameters. A
star/galaxy flag, δ_sg, was estimated for each of the catalogue
objects; this continuous parameter may be interpreted as the
probability that a particular object is a galaxy rather than
a star. The aggressiveness of star/galaxy separation may be
varied through the choice of threshold imposed on the δ_sg
parameter.

Figure 13. Effect of varying the ANNz star/galaxy separation
threshold. The solid line shows the number of stars passing the cut
as a fraction of the total number of objects admitted. The dashed
line shows the number of genuine galaxies which are discarded, as
a fraction of the total number of genuine galaxies in the original
sample.

Table 4. Parameters included in the MegaZ-LRG photometric
redshift catalogue.

Parameter   Description
objID       SDSS objID
ra          J2000 right ascension
dec         J2000 declination
dered_u     Dereddened model magnitude
dered_g     Dereddened model magnitude
dered_r     Dereddened model magnitude
dered_i     Dereddened model magnitude
dered_z     Dereddened model magnitude
deVMag_i    Dereddened de Vaucouleurs magnitude
z_phot      ANNz photometric redshift
delta_sg    ANNz galaxy probability

The MegaZ-LRG catalogue may be obtained from
http://www.2slaq.info. For each of the 1,214,117 objects
in the catalogue we provide the photometric redshift and δ_sg
parameter calculated as described above. To allow any of the
full range of SDSS photometric parameters to be obtained
straightforwardly, we also include each object's SDSS objID.
A basic set of photometric parameters is included with the
catalogue for convenience (Table 4). An example listing of
six objects from the MegaZ-LRG catalogue is provided in
Table 5.

Measurements of large-scale structure within photometric
redshift slices in the MegaZ-LRG catalogue are presented
in Blake et al. (2006). An independent analysis by
Padmanabhan et al. (2006), based on a similar sample, has
produced consistent results. A study of non-linear clustering
within MegaZ-LRG (Collister 2006) is to appear shortly, and
the MegaZ-LRG catalogue is due to be extended to make use
of the additional area provided in the SDSS Data Release 5.

Table 5. Example extract of six objects from the MegaZ-LRG catalogue.

objID               ra       dec     dered_u  dered_g  dered_r  dered_i  dered_z  deVMag_i  z_phot  delta_sg
587722952230174879  236.309  -0.430  22.482   21.051   19.339   18.540   18.119   18.392    0.455   0.987
587722952230175340  236.338  -0.579  22.864   22.350   21.209   19.972   19.584   19.781    0.644   0.987
587722952230175431  236.246  -0.449  22.171   22.449   21.089   19.965   19.643   19.613    0.600   0.998
587722952230175557  236.299  -0.573  24.406   21.805   20.501   19.714   19.184   19.955    0.514   0.266
587722952230175583  236.307  -0.436  24.250   22.456   20.886   19.793   19.613   19.956    0.570   0.999
587722952230175590  236.311  -0.513  22.855   22.185   20.872   20.099   19.720   19.973    0.513   0.999